Dimensionality Reduction via Matrix Factorization for Predictive Modeling from Large, Sparse Behavioral Data

نویسنده

  • Jessica Clark
چکیده

Matrix factorization is a popular technique for engineering features for use in predictive models; it is viewed as a key part of the predictive analytics process and is used in many different domain areas. The purpose of this paper is to investigate matrix-factorization-based dimensionality reduction as a design artifact in predictive analytics. With the rise in availability of large amounts of sparse behavioral data, this investigation comes at a time when traditional techniques must be reevaluated. Our contribution is based on two lines of inquiry: we survey the literature on dimensionality reduction in predictive analytics, and we undertake an experimental evaluation comparing using dimensionality reduction versus not using dimensionality reduction for predictive modeling from large, sparse behavioral data. Our survey of the dimensionality reduction literature reveals that, despite mixed empirical evidence as to the benefit of computing dimensionality reduction, it is frequently applied in predictive modeling research and application without either comparing to a model built using the full feature set or utilizing state-of-the-art predictive modeling techniques for complexity control. This presents a concern, as the survey reveals complexity control as one of the main reasons for employing dimensionality reduction. This lack of comparison is troubling in light of our empirical results. We experimentally evaluate the efficacy of dimensionality reduction in the context of a collection of predictive modeling problems from a large-scale published study. We find that utilizing dimensionality reduction improves predictive performance only under certain, rather narrow, conditions. Specifically, under default regularization (complexity control) settings dimensionality reduction helps for the more difficult predictive problems (where the predictive performance of a model built using the original feature set is relatively lower), but it actually decreases the performance on the easier problems. More surprisingly, employing state-ofthe-art methods for selecting regularization parameters actually eliminates any advantage that dimensionality reduction has! Since the value of building accurate predictive models for business analytics applications has been well-established, the resulting guidelines for the application of dimensionality reduction should lead to better research and managerial decisions. 2 Clark and Provost: Dimensionality Reduction for Predictive Modeling Working paper CBA-15-02

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic Relevance Determination in Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) has become a popular technique for data analysis and dimensionality reduction. However, it is often assumed that the number of latent dimensions (or components) is given. In practice, one must choose a suitable value depending on the data and/or setting. In this paper, we address this important issue by using a Bayesian approach to estimate the latent dime...

متن کامل

Fast and Unique Tucker Decompositions via Multiway Blind Source Separation

A multiway blind source separation (MBSS) method is developed to decompose large-scale tensor (multiway array) data. Benefitting from all kinds of well-established constrained low-rank matrix factorization methods, MBSS is quite flexible and able to extract unique and interpretable components with physical meaning. The multilinear structure of Tucker and the essential uniqueness of BSS methods ...

متن کامل

A social recommender system based on matrix factorization considering dynamics of user preferences

With the expansion of social networks, the use of recommender systems in these networks has attracted considerable attention. Recommender systems have become an important tool for alleviating the information that overload problem of users by providing personalized recommendations to a user who might like based on past preferences or observed behavior about one or various items. In these systems...

متن کامل

Topics in learning sparse and low-rank models of non-negative data

Advances in information and measurement technology have led to a surge in prevalence of high-dimensional data. Sparse and low-rank modeling can both be seen as techniques of dimensionality reduction, which is essential for obtaining compact and interpretable representations of such data. In this thesis, we investigate aspects of sparse and low-rank modeling in conjunction with non-negative data...

متن کامل

Voice-based Age and Gender Recognition using Training Generative Sparse Model

Abstract: Gender recognition and age detection are important problems in telephone speech processing to investigate the identity of an individual using voice characteristics. In this paper a new gender and age recognition system is introduced based on generative incoherent models learned using sparse non-negative matrix factorization and atom correction post-processing method. Similar to genera...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015